Increasingly Cautious Optimism for Practical PAC-MDP Exploration

Authors

  • Liangpeng Zhang
  • Ke Tang
  • Xin Yao
Abstract

Exploration strategy is an essential part of learning agents in model-based Reinforcement Learning. R-MAX and V-MAX are PAC-MDP strategies proven to have polynomial sample complexity; yet their exploration behavior tends to be overly cautious in practice. We propose the principle of Increasingly Cautious Optimism (ICO) to automatically cut off unnecessarily cautious exploration, and apply ICO to R-MAX and V-MAX, yielding two new strategies, namely Increasingly Cautious R-MAX (ICR) and Increasingly Cautious V-MAX (ICV). We prove that both ICR and ICV are PAC-MDP, and show that their improvement is guaranteed by a tighter sample-complexity upper bound. We then demonstrate their significantly improved performance through empirical results.
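Both R-MAX and V-MAX drive exploration by treating insufficiently sampled state-action pairs as maximally valuable until they have been visited a fixed number of times; it is this fixed caution threshold that the ICO principle cuts off. The sketch below illustrates only that shared baseline mechanism, not the paper's ICO rule itself; the values of GAMMA, R_MAX, and the visit threshold M are illustrative assumptions.

```python
# Minimal sketch of the R-MAX-style optimism that ICR/ICV build on.
# GAMMA, R_MAX, and M are illustrative assumptions, not values from
# the paper; the ICO cutoff rule itself is not reproduced here.
from collections import defaultdict

GAMMA = 0.95
R_MAX = 1.0
V_MAX = R_MAX / (1.0 - GAMMA)   # optimistic value for unknown pairs
M = 20                          # visits before (s, a) counts as "known"

visit_count = defaultdict(int)    # n(s, a)
q_estimate = defaultdict(float)   # empirical Q(s, a) for known pairs

def optimistic_q(state, action):
    """R-MAX assigns V_MAX to under-sampled pairs, which steers the
    agent toward them; ICO's idea is to stop doing so once the extra
    caution is no longer paying for itself."""
    if visit_count[(state, action)] < M:
        return V_MAX              # optimism in the face of uncertainty
    return q_estimate[(state, action)]

def greedy_action(state, actions):
    # Act greedily with respect to the optimistic value estimates.
    return max(actions, key=lambda a: optimistic_q(state, a))
```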

Similar papers

V-MAX: tempered optimism for better PAC reinforcement learning

Recent advances in reinforcement learning have yielded several PAC-MDP algorithms that, using the principle of optimism in the face of uncertainty, are guaranteed to act near-optimally with high probability on all but a polynomial number of samples. Unfortunately, many of these algorithms, such as R-MAX, perform poorly in practice because their initial exploration in each state, before the assoc...
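For reference, the guarantee this abstract alludes to is conventionally stated as follows. This is the standard PAC-MDP definition from the literature, written in my own notation, not text recovered from the truncated snippet.

```latex
% Standard PAC-MDP statement (notation assumed): with probability at
% least 1 - \delta, the number of timesteps on which the agent's
% policy is more than \epsilon-suboptimal is polynomially bounded.
\[
\Pr\Big[\,\#\{\, t : V^{\pi_t}(s_t) < V^{*}(s_t) - \epsilon \,\}
  \le \mathrm{poly}\big(|S|,\,|A|,\,\tfrac{1}{\epsilon},\,\tfrac{1}{\delta},\,\tfrac{1}{1-\gamma}\big)\Big]
  \ge 1 - \delta.
\]
```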

Bounded Optimal Exploration in MDP

Within the framework of probably approximately correct Markov decision processes (PAC-MDP), much theoretical work has focused on methods to attain near optimality after a relatively long period of learning and exploration. However, practical concerns require the attainment of satisfactory behavior within a short period of time. In this paper, we relax the PAC-MDP conditions to reconcile theoret...

PAC-MDP learning with knowledge-based admissible models

PAC-MDP algorithms approach the exploration-exploitation problem of reinforcement learning agents in an effective way which guarantees that with high probability, the algorithm performs near optimally for all but a polynomial number of steps. The performance of these algorithms can be further improved by incorporating domain knowledge to guide their learning process. In this paper we propose a ...

Probably Approximately Correct (PAC) Exploration in Reinforcement Learning

Abstract of the dissertation "Probably Approximately Correct (PAC) Exploration in Reinforcement Learning" by Alexander L. Strehl (Dissertation Director: Michael Littman). Reinforcement Learning (RL) in finite state and action Markov Decision Processes is studied with an emphasis on the well-studied exploration problem. We provide a general RL framework that applies to all results in this thesis and to other ...

Efficient PAC-Optimal Exploration in Concurrent, Continuous State MDPs with Delayed Updates

We present a new, efficient PAC-optimal exploration algorithm that is able to explore multiple continuous- or discrete-state MDPs simultaneously. Our algorithm does not assume that value-function updates can be completed instantaneously, and it maintains PAC guarantees in real-time environments. Not only do we extend the applicability of PAC-optimal exploration algorithms to new, realistic setti...

Publication date: 2015